1993-04-04
Turbo Pascal Record Compress Procedure
Carl A Franz
JFL Consulting
"We will sell no software
before it's written"
1115 S. Ridgeland
Oak Park, Il. 60304
(708) 383-1546
CServe: 71041,1512
When unzipping the COUN.ZIP file, you should have received:
1) COUN.PAS - The Compress/Uncompress source.
2) TESTCOUN.PAS - Demonstration program for COUN.
3) COUN.DOC - This file.
Quite frankly, this is only useful if you have a database like
BTrieve or TBTree which allows you to have variable-length records.
If you can't afford BTrieve (I can't), try TBTree, written by a guy
named Dean Farwell (73240,3335). I get nothing from Dean to plug his
product, so my opinion of this product is untainted. It's great. You
can put up a well-designed database with Turbo Pascal and the TBTree
product. It's much better than the Borland Database toolkit. I think
it's about $25 now. A heck of a bang for your buck.
Anyway, on to this product. The routines in COUN compress out
space in your records by removing the extra space in the STRING
variables. For instance, if you have a record for an address book like
the following:
  AddrBook = Record
    NAME  : String[40];
    ADDR1 : String[40];
    ADDR2 : String[40];
    City  : String[25];
    St    : String[2];
    Zip   : String[9];
  End;
You have allocated 162 bytes; however, rarely is all that space used
for actual data. For instance, my name and address use all of 64
bytes. That's a lot of wasted space. Since I am, in fact, building an
address book of sorts, and I am planning on keeping several record
types in one file, I figured I needed to save some space. Thus I wrote
these Compress/Uncompress routines for Turbo Pascal Records.
How it works:
There are 2 routines.
FUNCTION Compress(CMap : STRING; VAR InData; VAR OutData) : INTEGER;
This function accepts a map of your Pascal record (CMap), your
record (InData), and someplace you want the compressed record
information to go (OutData). I highly suggest that the field you use
for OutData be a byte array as large as the Record you are
compressing. The function then returns the length of the compressed
record.
PROCEDURE UnCompress(CMap : STRING; VAR InData; VAR OutData);
This procedure accepts a map of the record (CMap), your compressed
byte array (InData), and your record (OutData). I had been
considering swapping positions of InData and OutData so that the
calling conventions are the same for COMPRESS and UNCOMPRESS but
didn't. If you want to, go ahead, you've got the source code.
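The core trick can be sketched outside Pascal. Here is a minimal C
sketch (the names and the C rendering are mine, not COUN's; the real
routines live in COUN.PAS). A Turbo Pascal STRING[n] stores a length
byte followed by n data bytes, so only length+1 of those bytes ever
carry data, and that is all we copy:

```c
#include <string.h>

/* Hypothetical C sketch of the technique; COUN itself is Turbo
   Pascal and these names are invented.  A field declared STRING[40]
   occupies 41 bytes: one length byte plus 40 character slots. */

/* Compress one string field into the output buffer.
   Returns the number of bytes written. */
int compress_str(const unsigned char *field, unsigned char *out)
{
    int used = field[0] + 1;            /* length byte + used chars */
    memcpy(out, field, used);
    return used;
}

/* Uncompress: restore the field to its full declared size,
   zero-filling the unused tail. */
void uncompress_str(const unsigned char *packed, unsigned char *field,
                    int declared)
{
    memset(field, 0, declared + 1);     /* clear the whole field */
    memcpy(field, packed, packed[0] + 1);
}
```

For the NAME field above holding 'Carl', compress_str writes 5 bytes
where the STRING[40] field occupies 41.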
CMap, the record map, is the most complicated part of this mess. To
compress and uncompress your record, I need to know what it looks like.
To do this is fairly simple. I use the word 'fairly' advisedly.
Referring to the Address Book record, the CMap would be
'S40S40S40S25S2S9'. You should get an idea from that. Basically, you
tell me, in shorthand, what the fields in the record are. To wit:

  I = INTEGER;  2 bytes  (case is irrelevant)
  L = LONGINT;  4
  R = REAL;     6
  B = BYTE;     1
  S = STRING;
  C = CHAR;     1
  P = POINTER;  4
  W = WORD;     2
Types not supported are: enumerated types; the SINGLE, DOUBLE, and
COMP floating-point types; and set types.
'S' may have a length behind it to define the declared length of
the string: i.e., STRING[40] is 'S40'. If there is no length following
the string identifier 'S', I assume the length is 255 bytes, the
length of a field declared simply as STRING.
A number may be used to define a length of data. If you have 5
byte fields in a row, you can either have them defined as 'BBBBB' or
'5'. Likewise, if a record contains 2 Integers and a pointer you may
define them as 'IIP' or '8'. If you have a STRING[40] followed by 5
byte fields, you must separate them with a comma (','), i.e. 's40,5'.
Let's face it, 'S405' makes no sense. Also, an 'S' followed by a
number that is not the string's length must be separated by a comma.
I.e., if you have a field defined STRING followed by 5 BYTE fields,
'S5' would be assumed to be a 5-byte string; 'S,5' is a 255-byte
string followed by 5 bytes of whatever.
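The sizing rules above can be sketched as a little scanner. This is a
hypothetical C rendition with invented names (COUN's real parser is
the GetToken/GetNum pair in COUN.PAS):

```c
#include <ctype.h>

/* Absorb a run of digits from the map, advancing the index.
   Returns 0 if no digits are present. */
static int get_num(const char *s, int *i)
{
    int n = 0;
    while (isdigit((unsigned char)s[*i]))
        n = n * 10 + (s[(*i)++] - '0');
    return n;
}

/* Size in bytes contributed by the next CMap token.
   A comma contributes 0; an unknown mnemonic returns -1. */
int cmap_token_size(const char *map, int *i)
{
    char c = map[*i];
    if (c == ',') { (*i)++; return 0; }        /* separator only */
    if (isdigit((unsigned char)c))             /* raw byte count */
        return get_num(map, i);
    (*i)++;
    switch (c) {
    case 'S': case 's': {                      /* string: length+1 */
        int len = get_num(map, i);
        return (len ? len : 255) + 1;          /* plain STRING = 255 */
    }
    case 'I': case 'i': case 'W': case 'w': return 2;
    case 'L': case 'l': case 'P': case 'p': return 4;
    case 'R': case 'r': return 6;
    case 'B': case 'b': case 'C': case 'c': return 1;
    }
    return -1;                                 /* invalid mnemonic */
}
```

Walking 'S40S40S40S25S2S9' with this and summing the results gives the
162 bytes of the AddrBook record; 's,5' sizes out to 256, then 0 for
the comma, then 5.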
So, you say you've got arrays. I can handle that. Let's say you'd
defined a record thus:

  Rec = Record
    StrArray : Array [1..25] of String[40];
  End;
No problem. Arrays can be defined by brackets ('[',']'). A left
bracket '[' followed by the number of items in the array starts an
array definition and a right bracket ']' ends it. To wit: '[25s40]'
defines an array of 25 40-character strings (Array [1..25] of
STRING[40]).
Arrays can also be nested up to 100 levels deep. Actually, I've
allowed for 100 levels in my tables but realistically you may have
only 100 symbols of any kind in the CMap string. If you find a need
to expand the limits, go ahead. The type definitions L1 and L2 are
where to change them. These are the CMap parse tables.
There are two fields for flagging errors:

  1) COUNERR, an integer, where:
       1 is a memory allocation error.
       2 is an invalid mnemonic error
         (I don't recognize a record map token character;
          COUNWHR tells you the character position).
       3 is a bracket mismatch error.
       4 means the CMap is too big.
  2) COUNWHR, an integer field, gives the position in the CMap
     string of the character that caused the trouble.
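A caller-side check might look like this in C (hypothetical; the real
COUNERR and COUNWHR are variables exported by the COUN unit, and the
error-to-text mapping here is mine):

```c
/* Hypothetical C mirror of COUN's error fields: COUNERR classifies
   the failure, COUNWHR points at the offending CMap position. */
int COUNERR = 0, COUNWHR = 0;

const char *coun_error(void)
{
    switch (COUNERR) {
    case 0: return "ok";
    case 1: return "memory allocation error";
    case 2: return "invalid mnemonic";
    case 3: return "bracket mismatch";
    case 4: return "CMap too big";
    }
    return "unknown";
}
```

After calling Compress or UnCompress, a caller would test COUNERR
for nonzero and, for error 2, use COUNWHR to point at the bad
character in the map.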
There are several limits as to what I allow. Records can only
be 32000 bytes long. Also, like I said above, the maximum CMAP length
is 100 characters. Multi-dimensional arrays are not supported. Oh,
you can do it by defining a nested array, but I wouldn't try to define
a multi-dimensional array which contains strings unless you really
understand how Turbo Pascal allocates memory.
A note about the previous paragraph: There are no good reasons for
most of the limitations. I just didn't need anything bigger. If,
however, you do decide to make the byte array bigger, there are Turbo
Pascal limitations. Integers go to +32K, so indexes need to be changed
to LongInt or Word. I'm not sure how big arrays can be, but there is
a limit; look it up. Also, the obvious limit to the CMAP is 255
characters. If you come up with any interesting ways around that, let
me know.
When compiled, the COUN.PAS unit uses 2264 bytes of code and 53
bytes of data space. If there is enough interest in this (or if Dean
Farwell asks me to) I will convert this to TASM assembler. It should
then be faster and smaller. If someone else wants to do it, that's fine
also. Please send me the code when you're done.
The source code is provided for several reasons. 1) I like to see
what other people are doing; I assume others do too. 2) If someone
comes up with a nifty way of making these routines faster, smaller,
more elegant, whatever, I would like to know.
If you use these routines, I don't want money. Well, yes I do. If
you feel like sending me a fiver, go ahead. What I really want is to
know if anyone finds them useful. Drop me a note. I enjoy chatting
with others in the field.
If you, God forbid, find any bugs in these routines, please let me
know. I will fix them and get a new version out to you ASAP. I'm
very proud of my work, so I really do try hard to provide the best
time will allow. Also, try fixing them yourself, it's good practice.
I have a 20 month old child in the house so no late night calls.
Anything after 10pm CST and I'll probably get quite angry. You're much
more likely to get me via CompuServe than by phone. But if
you must, evenings and weekends are the best time. I do not, under any
circumstances, accept collect calls. Deal with it.
Biography: (I saw it in someone else's doc and thought it was a good
idea)
Carl Franz has been in programming for 13 years. He has written
code professionally for Univac, Burroughs, DEC, IBM mainframes, Z80
CP/M, and the IBM PC. Currently I'm a Technical Advisor for a commercial
bank. I consult on the side when the mood hits me. The JFL in JFL
Consulting stands for 'Just For Laughs' (not really, but you get the
point). Need a utility written? Give me a buzz; if it sounds like fun
we can work something out.
Yet again I'm going to plug TBTree. The next version will provide
Network support. It already provides fixed and variable length record
support, record lists, keys of Turbo Pascal variable types, so-so
documentation but good example programs. Last I looked, it was in
BPROGA Lib 2. It's a big download (about 300K) but worth it. All
source code provided. And, for goodness sake, pay the man his $25; it
isn't a lot for what you are getting and he needs to know if anyone is
really using the product. On top of which, as far as I can tell it's
bugless.
Good luck and may the farce be with you.
For Algorithm Freaks

The algorithm for the process is kind of brainless. (Brainless
means 'Why didn't I think of that earlier'.) Basically, there are 2
tables: 1) L1 absorbs all necessary information about the tokens
in the CMAP table; 2) L2 allows me to stack Array-Start information to
handle nested arrays.
At its very basics there are 6 token types: 1) 'S' or string with
an optional length; 2) scalar lengths (numeric values); 3) any of the
rest of the mnemonics which refer to Pascal types; 4) the start array
left bracket '[' plus iteration value; 5) the end array right bracket
']'; and 6) the lowly comma.
ParseCMap calls GetToken at the start of each loop. GetToken
looks at the next value in CMap and loads LP1T with the token type,
whatever the character is, and a length. The length of all Pascal
types is gotten via a SizeOf, except String (S), for which GetNum is
called to check if there is a numeric character after the 'S'. If
there is a numeric character, it absorbs characters from CMap until a
non-numeric value is found, converting the mess into an integer. LP1T
is later copied to the next item in the L1 table.
The '[' or Start-Array does something a little different. For the
most part it works the same as the String 'S' token, except, when one
is found, an entry is made onto stack L2. The entry consists of the
index value of where the '[' entry is in L1. As '['s are found, each
is pushed onto the stack. When an Array-End ']' token is found, an
entry in the L2 stack is popped. This entry contains the index
location of the matching Start-Array. The Size component of L1 is
then loaded with the location of the matching Start-Array so that when
they are finally processed you will know which entry of the L1 table
to return to for iteration.
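The push/pop linking just described can be sketched like so (a C
sketch with invented names; L1 and L2 are Pascal tables in COUN.PAS).
A ']' with nothing on the stack, or a leftover '[' at the end, is the
bracket mismatch case:

```c
#define MAXTOK 100                     /* matches the stated CMap limit */

struct l1_entry { char tok; int size; };   /* size doubles as link */

/* Walk a CMap recording one L1 entry per character, linking each
   ']' back to its matching '[' via a stack of '[' indexes (L2).
   Returns the number of entries, or -1 on bracket mismatch. */
int link_brackets(const char *cmap, struct l1_entry *l1)
{
    int l2[MAXTOK];                    /* stack of open '[' indexes */
    int sp = 0, n = 0;

    for (int i = 0; cmap[i]; i++, n++) {
        l1[n].tok = cmap[i];
        l1[n].size = 0;
        if (cmap[i] == '[') {
            l2[sp++] = n;              /* push this Start-Array */
        } else if (cmap[i] == ']') {
            if (sp == 0) return -1;    /* ']' with no '[' */
            l1[n].size = l2[--sp];     /* link back to its '[' */
        }
    }
    return sp == 0 ? n : -1;           /* unclosed '[' is an error */
}
```

For '[3[2B]]' the inner ']' ends up linked to index 2 and the outer
']' to index 0, so nesting resolves correctly.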
I have to apologize about the naming conventions. I was rereading
some notes on expression parsing and evaluation from college which
used the same stupidly cryptic conventions. I wasn't feeling
particularly creative at 2:30am so I used them instead of making up
better ones.
On the Compress/DeCompress side you step through the L1 array and do
what it says. Except: when a Start-Array is found, the iteration
count is moved from Size to Decr. Then, upon seeing an End-Array,
the Decr of the matching Start-Array is checked: if 0, then nothing is
done and processing continues to the next item; else, if Decr is not
zero, it is decremented and the index address of the matching
Start-Array is loaded into the L1 index. Remember that the L1 index
will be incremented before checking the next L1 entry, so the
Start-Array will not actually be processed again during the 'array
loop'. Also, each Start-Array has its own Decr, thus nested arrays
will process properly.
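Assuming the End-Array entry's Size holds the index of its matching
Start-Array, the walk described above sketches out like this in C
(names invented; 'A' stands in for a Start-Array entry, 'E' for an
End-Array, and anything else for a field to compress):

```c
struct entry { char tok; int size; int decr; };

/* Walk a prebuilt table: 'A' opens an array (size = iteration
   count), 'E' closes one (size = index of the matching 'A'), and
   any other token "does work" once per visit.  Each 'A' carries its
   own decrement counter, so nested arrays iterate independently.
   Returns how many times the work tokens were processed. */
int walk(struct entry *t, int n)
{
    int work = 0;
    for (int i = 0; i < n; i++) {
        if (t[i].tok == 'A') {
            t[i].decr = t[i].size - 1; /* remaining iterations */
        } else if (t[i].tok == 'E') {
            int start = t[i].size;     /* matching Start-Array */
            if (t[start].decr > 0) {
                t[start].decr--;
                i = start;             /* i++ then skips the 'A' itself */
            }
        } else {
            work++;                    /* compress/uncompress a field */
        }
    }
    return work;
}
```

Note the jump lands on the Start-Array index so that the loop's i++
moves past it, exactly as described: the matched Start-Array is not
reprocessed, but any nested Start-Arrays inside the loop are, which
resets their own Decr counters each pass.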
There is a slightly more efficient way of handling the array loops;
however, it involves another integer in L1 and some somewhat more
complicated code. Also, I have a blind spot figuring out where I
should be with indexes. I'm always one ahead or behind where I should
be.